
Decision Making Under Uncertainty: A Neural Model Based on Partially Observable Markov Decision Processes



Abstract

A fundamental problem faced by animals is learning to select actions based on noisy sensory information and incomplete knowledge of the world. It has been suggested that the brain engages in Bayesian inference during perception but how such probabilistic representations are used to select actions has remained unclear. Here we propose a neural model of action selection and decision making based on the theory of partially observable Markov decision processes (POMDPs). Actions are selected based not on a single “optimal” estimate of state but on the posterior distribution over states (the “belief” state). We show how such a model provides a unified framework for explaining experimental results in decision making that involve both information gathering and overt actions. The model utilizes temporal difference (TD) learning for maximizing expected reward. The resulting neural architecture posits an active role for the neocortex in belief computation while ascribing a role to the basal ganglia in belief representation, value computation, and action selection. When applied to the random dots motion discrimination task, model neurons representing belief exhibit responses similar to those of LIP neurons in primate neocortex. The appropriate threshold for switching from information gathering to overt actions emerges naturally during reward maximization. Additionally, the time course of reward prediction error in the model shares similarities with dopaminergic responses in the basal ganglia during the random dots task. For tasks with a deadline, the model learns a decision making strategy that changes with elapsed time, predicting a collapsing decision threshold consistent with some experimental studies. The model provides a new framework for understanding neural decision making and suggests an important role for interactions between the neocortex and the basal ganglia in learning the mapping between probabilistic sensory representations and actions that maximize rewards.
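As a rough illustration of the kind of computation the abstract describes, the sketch below combines a Bayesian belief update with temporal difference (TD) learning over belief states for a two-alternative task that includes a "sample more information" action. It is not the authors' implementation: the observation reliability, sampling cost, number of belief bins, and learning parameters are illustrative assumptions, and a tabular value function over discretized beliefs stands in for the paper's neural (neocortex/basal ganglia) architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal sketch (assumed parameters): two hidden states (motion left / right),
# noisy binary observations, and three actions: keep sampling, report left, report right.
N_STATES = 2
ACTIONS = ["sample", "choose_left", "choose_right"]
P_CORRECT_OBS = 0.65   # assumed probability an observation matches the true state
SAMPLE_COST = -0.01    # assumed small cost for gathering more information
REWARD_CORRECT = 1.0
REWARD_WRONG = 0.0

def belief_update(belief, obs):
    """Bayesian update of the posterior over hidden states given one observation."""
    likelihood = np.where(np.arange(N_STATES) == obs, P_CORRECT_OBS, 1 - P_CORRECT_OBS)
    posterior = likelihood * belief
    return posterior / posterior.sum()

def discretize(belief, n_bins=20):
    """Index the belief state by binning P(state = right)."""
    return min(int(belief[1] * n_bins), n_bins - 1)

# Tabular Q-values over discretized belief states; the paper posits a neural
# representation, a lookup table is used here only to keep the sketch short.
n_bins = 20
Q = np.zeros((n_bins, len(ACTIONS)))
alpha, gamma, epsilon = 0.1, 0.99, 0.1

for episode in range(20000):
    true_state = rng.integers(N_STATES)
    belief = np.array([0.5, 0.5])        # uniform prior over directions
    done = False
    while not done:
        b = discretize(belief)
        a = rng.integers(len(ACTIONS)) if rng.random() < epsilon else int(Q[b].argmax())
        if ACTIONS[a] == "sample":
            obs = true_state if rng.random() < P_CORRECT_OBS else 1 - true_state
            new_belief = belief_update(belief, obs)
            # TD(0) update: reward is the sampling cost, bootstrap from the next belief
            td_target = SAMPLE_COST + gamma * Q[discretize(new_belief)].max()
            Q[b, a] += alpha * (td_target - Q[b, a])
            belief = new_belief
        else:
            choice = 0 if ACTIONS[a] == "choose_left" else 1
            r = REWARD_CORRECT if choice == true_state else REWARD_WRONG
            Q[b, a] += alpha * (r - Q[b, a])   # terminal TD update
            done = True

# The belief level at which "sample" stops being the preferred action plays the
# role of a decision threshold that emerges from reward maximization.
for b in range(n_bins):
    print(f"P(right) ≈ {(b + 0.5) / n_bins:.2f}  best action: {ACTIONS[int(Q[b].argmax())]}")
```

In this toy setting, the learned policy samples while the belief is near 0.5 and commits to a choice once the posterior is sufficiently skewed, mirroring the switch from information gathering to overt action described in the abstract; modeling a deadline or urgency signal, which the paper uses to obtain a collapsing threshold, is omitted here.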

Bibliographic Details

  • Author

    Rao, Rajesh P. N.;

  • Author affiliation
  • Year: 2010
  • Total pages
  • Original format: PDF
  • Language: English
  • CLC classification

